Learning Translation Rules from Bilingual English - Filipino Corpus

Authors

  • Michelle Wendy Tan
  • Raymond Joseph Ang
  • Natasja Gail Bautista
  • Ya Rong Cai
  • Bianca Tanlo
Abstract

Most machine translators are implemented using example-based, rule-based, or statistical approaches. However, each of these paradigms has its drawbacks. Example-based and statistical approaches are domain-specific and require a large database of examples to produce accurate translations. Although the rule-based approach is known to produce high-quality translations, a linguist is needed to derive the set of rules to be used. To address these problems, we present an approach that uses rules to translate English text into Filipino. It learns the rules by analyzing a bilingual corpus, in an attempt to eliminate the need for a linguist. The learning algorithm is based on the seeded version space learning algorithm presented by Probst (2002). The implementation of the algorithm has been modified to allow learning for languages that are not lexically aligned and to adapt to the complex free word order of Filipino.
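The generalization idea behind learning rules from aligned sentence pairs can be illustrated with a much-simplified sketch. This is not Probst's algorithm and not the authors' implementation: the `Rule` class, the `generalize` function, the part-of-speech tags, and the two example sentence pairs are all hypothetical, chosen only to show how two word-level seed rules might be merged into one rule with a part-of-speech slot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """A word-level transfer rule (hypothetical structure for illustration)."""
    source: tuple     # source-side slots: literal words or POS tags
    target: tuple     # target-side slots: literal words or POS tags
    pos: tuple        # POS tag of each source slot
    alignment: tuple  # (source index, target index) pairs


def generalize(a: Rule, b: Rule) -> Optional[Rule]:
    """Merge two seed rules that share POS sequence and alignment.

    Aligned slots whose words differ are replaced by the POS tag.
    Returns None when the rules cannot be merged.
    """
    if a.pos != b.pos or a.alignment != b.alignment:
        return None
    if len(a.target) != len(b.target):
        return None

    aligned_src = {s for s, _ in a.alignment}
    aligned_tgt = {t for _, t in a.alignment}

    # A difference in an unaligned slot cannot be explained by a shared slot.
    for i, (x, y) in enumerate(zip(a.source, b.source)):
        if x != y and i not in aligned_src:
            return None
    for j, (x, y) in enumerate(zip(a.target, b.target)):
        if x != y and j not in aligned_tgt:
            return None

    source, target = list(a.source), list(a.target)
    for s_idx, t_idx in a.alignment:
        if a.source[s_idx] != b.source[s_idx] or a.target[t_idx] != b.target[t_idx]:
            source[s_idx] = a.pos[s_idx]
            target[t_idx] = a.pos[s_idx]
    return Rule(tuple(source), tuple(target), a.pos, a.alignment)


# Two invented seed rules; note the verb-initial order on the Filipino side.
pos = ("DET", "NOUN", "VERB")
alignment = ((0, 1), (1, 2), (2, 0))
dog = Rule(("the", "dog", "runs"), ("tumatakbo", "ang", "aso"), pos, alignment)
cat = Rule(("the", "cat", "runs"), ("tumatakbo", "ang", "pusa"), pos, alignment)

merged = generalize(dog, cat)
print(merged.source, merged.target)
```

Because the alignment is stored as explicit index pairs, the merged rule keeps the reordering between the two languages while abstracting only the noun slot.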

Similar resources

Rule Extraction Applied in Language Translation

Machine translation (MT) has been used to address problems inherent in human translation. However, the quality of machine translations is usually unacceptable. Research has focused on improving quality by incorporating machine learning into translation. One example is TWiRL, which translates English sentences into Filipino. However, TWiRL’s approach presented a strict requirement of a...

Learning Translation Rules for a Bidirectional English-Filipino Machine Translator

Filipino is a changing language that poses several challenges. Our goal is to develop a bidirectional English-Filipino Machine Translation (MT) system using a hybrid approach to learn rules from examples. The first phase was an English to Filipino MT system that required several language resources. The problem lies in its dependency on the annotated grammar, which is currently unavailable for ...

Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data

Statistical phrase-based translation learns translation rules from bilingual corpora, and has traditionally only used monolingual evidence to construct features that rescore existing translation candidates. In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data. The proposed technique first constructs ph...

TExt Translation: Template Extraction for a Bidirectional English-Filipino Example-Based Machine Translation

In this paper, we present TExt Translation, a bidirectional English-Filipino Example-based Machine Translation System that learns and uses templates. These templates are used for translating English input text into Filipino and vice versa. Minimal language resources and information are used since these resources are few and may contain errors. The system uses an untagged bilingual corpus, lexic...

Building A Training Corpus For Word Sense Disambiguation In English-To-Vietnamese Machine Translation

The most difficult task in machine translation is the elimination of ambiguity in human languages. A given word in English, as in Vietnamese, often has different meanings that depend on its syntactic position in the sentence and the actual context. To resolve this ambiguity, people formerly resorted to many hand-coded rules. Nevertheless, manually building these rules ...

Publication date: 2005